(54) Audio signal processing

(57) A speech coder (14) is operable to compress digital data representing speech using a Waveform
Interpolation speech coding method. The coding method is carried out on the residual signal from a
Linear Predictive Coding stage. On the basis of a series of overlapping frames of the residual signal,
a series of respective spectra are found. The evolution of the spectra is filtered in a multi-stage
filtering process, the filtered phase data being replaced with the original phase data at the end of
each stage. This is found to result in the decoder (28) being better able to approximate the original
speech signal. This is of particular utility in relation to mobile telephony.
Description

[0001] The present invention relates to audio signal processing. It has particular utility in relation to the separation of
voiced speech and unvoiced speech in low bit-rate speech coders.

[0002] Low bit-rate speech coders are becoming increasingly commercially important as they enable a more efficient
utilisation of the portion of the radio spectrum available to mobile phones. 

[0003] Speech can be classified into three parts - voiced speech, unvoiced speech and silence. Any one of these may
be corrupted by the addition of background noise. On a timescale of milliseconds, voiced speech can be viewed as a
succession of repeated waveforms. This fact is exploited in a class of speech coding methods known as Prototype
Waveform Interpolation (PWI) methods. Essentially, these methods involve sending information describing repeated
pitch period waveforms only once, thereby reducing the number of bits required to encode the speech signal. Initial PWI
speech coding methods only encoded voiced speech, the other portions of the speech signal being coded using other
methods (e.g. Code Excited Linear Prediction methods). One example of such a hybrid coding technique is described
in "Encoding Speech Using Prototype Waveforms", W.B. Kleijn, IEEE Transactions on Speech and Audio Processing,
Vol. 1, pp. 386-399, October 1993.

[0004] Later PWI methods were generalised so as to enable unvoiced speech and noise to be encoded as well. An
example of such a method is described in "A General Waveform-Interpolation Structure for Speech Coding", W. B. Kleijn
and J. Haagen, Signal Processing Theories and Applications, M. Holt, C. Cowan, P. Grant, W. Sandham (Eds.),
pp. 1665-1668, 1994.

[0005] However, such coders have drawbacks in that the reconstituted speech sounds buzzy. The present inventors
have established that the cause of this 'buzziness' is a poor separation of the voiced components of speech and the
unvoiced/noisy components of speech.

[0006] According to a first aspect of the present invention there is provided a method of extracting one of a concordant
component and a discordant component of a predetermined segment of an audio signal, said method comprising the
steps of:

forming an initial evolution surface from a series of combined magnitude and phase spectra representing segments 
of said signal around said predetermined segment; 

modifying said initial evolution surface to obtain a modified evolution surface representing said one of the
concordant component or the discordant component of said signal; and

extracting said one of the concordant component or the discordant component of said predetermined segment from 
said modified evolution surface; 
wherein said modifying step involves: 

a plurality of component filtering steps and, prior to at least one of those filtering steps, the substitution of phase
information derived from said initial evolution surface or an earlier one of the component steps for the phase
information derived from the most recent component step.

[0007] Here, concordant is intended to refer to signals whose phase changes slowly in comparison to discordant sig- 
nals whose phase changes more rapidly. 

[0008] The present inventors have found that the rate of evolution of the phase information is useful in distinguishing 
between voiced speech (the concordant component of speech) and unvoiced speech/noise (the discordant component 
of speech). 

[0009] However, it is likely that the invention will find application in other areas of audio signal processing such as the
enhancement of noise-corrupted speech or music signals.

[0010] Conventional low-pass and high-pass Finite Impulse Response (FIR) digital filtering techniques do not reduce
the magnitude of discordant and concordant signals respectively to zero. Therefore, they are limited in how well they
can extract one of the concordant or discordant components of an audio signal.

[0011] A conventional FIR filter might be approximated by a series of shorter FIR filters. By decomposing a filtering
process into a plurality of filtering stages and, in one or more of the intervals between those filtering stages, substituting
phase information from an earlier stage for phase information from the most recent stage, a filtering process results
which repeatedly uses the earlier phase information. Filtering a signal tends to smooth its phase and hence a filtered
signal contains less information distinguishing its concordant and discordant parts. By reinstating the earlier phase
information, the concordant or discordant component can be more thoroughly removed in the subsequent filtering
stage(s). The result is an audio signal filtering process which is better able to extract a concordant or discordant
component of an audio signal.

[0012] As suggested above, a repeated application of a low-pass filter will leave a modified evolution surface
representing the concordant component of said predetermined segment. Preferably, each low-pass filtering step involves the
application of an identical low-pass filter. This minimises the complexity of the processing method.

[0013] In preferred embodiments, the phase information derived from the initial evolution surface is used in all of said
component steps. This maximises the effectiveness of the extraction method.

[0014] One way in which the discordant component can be calculated is to calculate the concordant component
according to the first aspect of the present invention and subtract this from the original signal. Similarly, one way in
which the concordant component can be calculated is to calculate the discordant component according to the first
aspect of the present invention and subtract this from the original signal.

[0015] According to a second aspect of the present invention, there is provided an audio signal processor operable
to extract one of a concordant component and a discordant component of a predetermined segment of an audio signal,
said apparatus comprising:

means arranged in operation to form an initial evolution surface from a series of combined magnitude and phase 
spectra representing segments of said signal around said predetermined segment; 

means arranged in operation to modify said initial evolution surface to obtain a modified evolution surface
representing said one of the concordant component or the discordant component of said signal; and

means arranged in operation to extract said one of the concordant component or the discordant component of said
predetermined segment from said modified evolution surface;
wherein said apparatus further comprises: 

means arranged in operation to carry out a plurality of filtering steps and, prior to at least one of those filtering
steps, to substitute phase information derived from said initial evolution surface or an earlier one of the component
steps for the phase information derived from the most recent component step.

[0016] According to a third aspect of the present invention, there is provided a speech coding apparatus including: 

a storage medium having recorded therein processor readable code processable to encode input speech data, said
code including: 

initial evolution surface generation code processable to generate initial evolution surface data comprising combined
magnitude and phase data for segments of said input speech data; 

separation code processable to derive separate phase data and magnitude data from said input speech data; 
evolution surface modification code processable to generate a modified evolution surface representing one of a 
voiced component or an unvoiced/noise component of said input speech data; and

component extraction code processable to extract said one of the voiced component or the unvoiced/noise
component from said input speech data;

wherein said evolution surface modification code comprises:

evolution surface filtering code processable to filter said initial evolution surface data a plurality of times; 
evolution surface decomposition code processable to derive magnitude data and phase data subsequent to one or
more of said filtering steps; and

earlier phase reinstatement code processable to replace the phase data obtained on processing said evolution
surface decomposition code with an earlier version of the phase data.

[0017] According to another aspect of the present invention there is provided a method of waveform interpolation
speech coding comprising:

forming an initial evolution surface from a series of combined characteristic waveforms or spectra representing
respective segments of said speech;

wherein said formation involves aligning each of said characteristic waveforms or spectra with an earlier
characteristic waveform or spectrum of said series; and

said earlier waveform or spectrum is separated from the characteristic waveform or spectrum to be aligned with it
by a variable number of members of said series, said variable number varying in accordance with the pitch of said 
signal. 

[0018] It is found that the decoded version of unvoiced speech which has passed through a known waveform
interpolation coder tends to have too high a periodic component. To reduce the undesirable periodic component in the output
version of unvoiced speech, alignment is made with a characteristic waveform or spectrum that is far enough back in
the series to have a relatively low number of overlapping samples.

[0019] There now follows, by way of example only, a description of some embodiments of the present invention. The
embodiments are described with reference to the accompanying drawings, in which:


Figure 1 is a schematic illustration of the application of a first embodiment of the present invention to a mobile
telephony network;

Figure 2 shows the processes carried out in an encoder part of a mobile telephone forming part of the network of
Figure 1;



Figure 3 is a schematic illustration of a spectral evolution surface produced during the operation of the encoder of
Figure 2;



Figure 3B shows the evolution of an unvoiced speech frequency component over time;

Figure 3C shows the evolution of a voiced speech frequency component over time; 

Figure 4 is a flow diagram which illustrates an evolution surface derivation method of prior-art encoders;

Figure 5 is a flow diagram which illustrates the evolution surface derivation method of the first embodiment;

Figure 6 shows the processes carried out by the decoder part of a mobile telephone according to the first
embodiment of the present invention; and



Figure 7 illustrates the reduction of the unvoiced components of the evolution surface achieved using the method
of the first embodiment.



[0020] A mobile telephone network (Figure 1) operating in accordance with a first embodiment of the present invention
is operable to allow a first user A to converse with a second user B. User A's mobile phone is operable to transmit
a radio signal representing parameters modelling user A's speech. The radio signal is received by a base station 17
which converts it to a digital electrical signal which it forwards to the Public Switched Telephone Network (PSTN) 20.
The Public Switched Telephone Network 20 is operated to make a connection between base station 17 and a base
station 22 currently serving user B. The digital electrical signal is passed across the connection and, on receiving the
signal, the base station 22 converts the digital electrical signal to parameters representing user A's speech. Thereafter, the
base station 22 transmits a radio signal representing those parameters to user B's mobile phone 24. User B's mobile
phone receives the radio signal and converts it back to an analogue electrical signal which is used to drive a
loudspeaker 32 to reproduce A's voice. A similar communications path exists in the other direction from user B to user A.
[0021] For each of the radio communication sections, the mobile phone network selects an appropriate bit-rate for the
parameters representing the user's speech from a full bit-rate (6.7 kbit/s), an intermediate bit-rate (4.6 kbit/s) and a half
bit-rate (2.3 kbit/s).

[0022] The signal processing carried out in each mobile phone is now described in more detail. User A speaks into
the microphone 10 of his mobile telephone 11 which converts his voice into an analogue electrical signal. This analogue
signal is then passed to an Analogue to Digital Converter (ADC) 12 which digitises the signal to provide a 64 kbit/s
digitally coded speech signal. A Waveform Interpolation (WI) encoder 14 receives the digitally coded speech signal and
reduces it to a 6.7 kbit/s stream of parameters which represent user A's speech. The parameters are passed to a
quantiser 16 which is operable to provide a variable rate parameter stream. The quantiser may simply forward the full-rate
parameter stream or, if required, reduce the bit-rate of the parameter stream still further to the intermediate rate
(4.6 kbit/s) or the half-rate (2.3 kbit/s).

[0023] It will be realised by those skilled in the art that the variable rate parameter stream undergoes further channel
coding before being converted to a radio signal for transmission over the radio communication path to the base station.

[0024] User B's mobile phone recovers the variable rate parameter stream and, if required, uses interpolation to
generate the 6.7 kbit/s parameter stream before passing the parameters to a decoder 28. The decoder 28 processes the
parameter stream to provide a digitally coded reconstruction of user A's speech which is then converted to an analogue
electrical signal by the Digital to Analogue Converter (DAC) 30, which signal is used to drive the loudspeaker 32.
[0025] The operation of the WI encoder 14 will now be described in more detail. The encoder 14 of user A's mobile
phone receives the digitally coded speech signal from the Analogue to Digital Converter 12 and carries out a number
of processes (Figure 2) on the digitally coded speech signal to provide the stream of parameters representing user A's
speech.

[0026] The encoder first divides the digitally coded speech signal into 10ms frames. Linear Predictive Coding (LPC)
techniques (34, 36, 38) are then used in a conventional manner to provide, for each frame, a set of ten spectral shape
parameters (Line Spectral Frequencies or LSFs) and a residual signal.

[0027] A pitch period detection process 40 provides a measure (expressed as a number of sample instants) of the 
pitch of the current frame of speech. 

[0028] The residual signal then passes to a waveform extraction process which is carried out to obtain a characteristic
waveform for each one of four 2.5ms sub-frames of each frame. Each characteristic waveform has a length equal to the
pitch period of the signal at that sub-frame. Given that voiced speech normally has a pitch period in the range 2ms to
18.75ms, it will be realised that the characteristic waveforms will normally overlap one another to a significant degree.
The residual signal for voiced speech has a sharp spike in each pitch period and the window used to isolate the pitch
period concerned is movable by a few sample points so as to ensure the spike is not close to the edge of the window.
Expressed in mathematical notation, the characteristic waveforms are obtained by windowing the residual signal as
follows:



$$cw[i,k] = res\left(k + 20i - \frac{p_i}{2} - q\right), \quad i = 0, 1, 2, 3, \quad k = 1, 2, 3, \ldots, p_i \qquad \text{(Equation 1)}$$

[0029] Where cw[i,k] represents the characteristic waveform for the ith sub-frame and res(x) means the value of the
xth sample of the residual signal. The pitch period from the pitch detector is p_i and, if required, q is increased from 0 to
4 in order to shift the spike in the residual away from the edge of the window.

[0030] The characteristic waveforms (of length p_i) thus extracted then undergo a Discrete Fourier Transform (DFT)
44 to produce, for each residual sub-frame, a characteristic spectrum. In mathematical notation, the characteristic
spectra (CS) are calculated as follows:

$$CS[i,\omega] = DFT(cw[i,k],\, p_i) \qquad \text{(Equation 2)}$$

[0031] Where CS[i,ω] is a complex value associated with a frequency interval ω and the ith sub-frame of the residual,
the complex values for all frequency intervals forming a complex spectrum for the ith sub-frame of the residual. cw[i,k]
and p_i are as defined above.

[0032] The conventional technique of zero-padding is then used to expand the characteristic spectra so that they are
all 76 values in length. To compensate for the effect of this on the power spectrum, the magnitude part of the
characteristic spectra relating to shorter pitch periods is decreased in proportion to the pitch period associated with the
residual sub-frame from which it is derived. In mathematical notation:

$$|CS_{norm}[i,\omega]| = \frac{1}{p_i}\,|CS[i,\omega]|, \quad \omega = 0, 1, 2, \ldots, 76 \qquad \text{(Equation 3)}$$

[0033] Where |CS_norm[i,ω]| represents the magnitude (or, in mathematical language, modulus) of the normalised
complex spectral values and |CS[i,ω]| represents the magnitude of the complex value CS[i,ω]. p_i is as defined above.
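
The transform, zero-padding and normalisation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the coder's actual implementation: the function names are hypothetical, the spectrum is taken to be one-sided (p // 2 + 1 bins, which gives 76 bins for the longest 150-sample pitch period), and the normalisation is read as a division of each value by the pitch period.

```python
import cmath

NUM_BINS = 76  # spectrum length after zero-padding, as stated in the text


def characteristic_spectrum(cw):
    """One-sided DFT of a characteristic waveform of length p (assumed form)."""
    p = len(cw)
    return [
        sum(cw[k] * cmath.exp(-2j * cmath.pi * w * k / p) for k in range(p))
        for w in range(p // 2 + 1)
    ]


def normalise_and_pad(cs, p, num_bins=NUM_BINS):
    """Divide each value by the pitch period p (assumed reading of the
    normalisation step), then append zeros up to num_bins values."""
    scaled = [value / p for value in cs]
    return scaled + [0j] * (num_bins - len(scaled))


# Hypothetical usage: a 40-sample waveform consisting of a single spike.
cw = [1.0] + [0.0] * 39
padded = normalise_and_pad(characteristic_spectrum(cw), len(cw))
```

Dividing the complex value by p scales its magnitude and leaves its phase untouched, which matches the statement that only the magnitude part is adjusted.
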
[0034] It will be realised that the characteristic spectra are generally obtained from signal segments which overlap at
least the signal segments used in deriving the previous and subsequent characteristic spectra. For voiced speech
segments, there will be little difference in the magnitude of the complex values associated with each frequency interval of
a spectrum and the corresponding magnitude values of the spectra derived from adjacent segments of the signal.
However, the time offset between the adjacent signals manifests itself as a phase offset between adjacent spectra. In order
to correct this phase offset the phase spectra (consisting of the phase, or, in mathematical language, argument of the
complex spectral values) are operated on by alignment process 46.

[0035] Where the pitch period of the signal is long, a large number of samples may be used in calculating both a
current spectrum and the spectra on either side. This leads to a similarity between adjacent spectra even in signals that
are noisy in character. This similarity is undesirable since it reduces the distinction between voiced and unvoiced
speech. In order to prevent such similarity arising in relation to unvoiced speech/noise each characteristic spectrum is
aligned with another characteristic spectrum which may precede it by as many as four sub-frames. The interval
(measured in sub-frames) between the characteristic spectra which are aligned with one another increases with increasing
pitch period as follows:

if p_4 < 90 then d = 1
if 90 ≤ p_4 < 105 then d = 2
if 105 ≤ p_4 < 125 then d = 3
if p_4 ≥ 125 then d = 4
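
The mapping from pitch period to alignment interval can be written as a small lookup. The following is a minimal sketch, assuming the pitch period is expressed in samples and that the final case applies to pitch periods of 125 samples or more; the function name is hypothetical.

```python
def alignment_interval(pitch_period):
    """Return d, the number of sub-frames separating the two aligned spectra.

    Thresholds follow the table in the text; the last branch is assumed to
    cover every pitch period of 125 samples or more.
    """
    if pitch_period < 90:
        return 1
    if pitch_period < 105:
        return 2
    if pitch_period < 125:
        return 3
    return 4


# Hypothetical usage: a long, 130-sample pitch period is aligned four sub-frames back.
d = alignment_interval(130)
```
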

[0036] The alignment process shifts the phase values of one of the characteristic spectra to be aligned until the
correlation between phase values of the two spectra reaches a maximum. The offset that is required to do this provides a
phase correction for each one of the 76 frequency bins in the characteristic spectrum associated with a given
sub-frame. The 'aligned' phase values are calculated by summing the original phase values and the phase correction (each
is expressed in radians).
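
One way to picture the alignment search is as a grid search over trial offsets. The sketch below is an assumption-laden illustration rather than the alignment process 46 itself: it assumes that a trial offset theta contributes a correction of k * theta radians to bin k (equivalent to a circular time shift of the waveform), it scores each trial by the real part of the complex cross-correlation instead of correlating raw phase values, and the function name and the 360-step grid are hypothetical.

```python
import cmath
import math


def align_spectrum(current, reference, num_steps=360):
    """Rotate the phases of `current` so that it best matches `reference`.

    Returns the aligned spectrum and the offset theta that was applied.
    Bin k receives a phase correction of k * theta radians.
    """
    best_theta, best_score = 0.0, -math.inf
    for step in range(num_steps):
        theta = 2.0 * math.pi * step / num_steps
        # Correlation between the trial-shifted spectrum and the reference.
        score = sum(
            (c * cmath.exp(1j * k * theta) * r.conjugate()).real
            for k, (c, r) in enumerate(zip(current, reference))
        )
        if score > best_score:
            best_theta, best_score = theta, score
    # Aligned phase = original phase + correction, magnitudes unchanged.
    aligned = [c * cmath.exp(1j * k * best_theta) for k, c in enumerate(current)]
    return aligned, best_theta
```

Multiplying by a unit-modulus exponential adds the correction to the phase of each bin and leaves its magnitude as it was, in line with the summation of original phase and phase correction described above.
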

[0037] The phase spectrum is then combined with the magnitude spectrum associated with the sub-frame to provide
an aligned characteristic spectrum for each sub-frame. Expressed mathematically,

$$CS_{aligned}[i,\omega] = |CS_{norm}[i,\omega]|\, e^{\,j\phi_{CSaligned}[i,\omega]} \qquad \text{(Equation 4)}$$

[0038] Where j is √(-1) and φ_CSaligned[i,ω] represents the phase value obtained for the frequency interval ω associated
with the ith sub-frame following the alignment procedure.

[0039] A normal representation of a spectrum has a series of bars spaced along a frequency axis and representing
consecutive frequency intervals. The height of each bar is proportional to the magnitude of the complex spectral value
associated with the corresponding frequency interval. It is possible to visualise a further axis arranged perpendicularly
to the frequency axis which represents the time at which a spectrum was obtained. Another spectrum derived a time
interval later can then be visualised aligned with and parallel to the first spectrum and spaced therefrom in accordance
with the scaling of the time axis. If this process is repeated for several spectra then a surface defined by the tops of the
bars can be envisaged or computed from the individual magnitudes.

[0040] A simplified illustration of such a visualisation of the 'aligned' characteristic spectra output by alignment stage
46 is shown in Figure 3A (note that the alignment does not alter the magnitudes of the complex values forming the
characteristic spectra and hence Figure 3A equally well represents the normalised characteristic spectra). For ease of
illustration, only 11 spectral values are shown, rather than 76 as is actually the case in the embodiment.
[0041] The so-called 'evolution' of a spectral magnitude associated with a given frequency interval can be envisaged
as the variation in that spectral magnitude over spectra derived from consecutive time intervals. The evolution of the
magnitude associated with the second lowest frequency interval from time t0 to t4 in Figure 3A is therefore the
succession of values V1, V2, V3, V4, V5.

[0042] As indicated above, the complex spectra in fact contain phase values as well as the magnitudes associated
with a given frequency interval. The present inventors have found that an evolution of the complex spectral values
associated with unvoiced speech is more erratic than an analogous evolution derived from voiced speech. In particular, the
phase component of the complex value varies more erratically for unvoiced speech. Figure 3B illustrates how a complex
spectral value derived from unvoiced speech might evolve (the length of the line represents the magnitude, the angle a
represents the phase). Figure 3C shows an evolution likely to be associated with voiced speech.
[0043] Returning to Figure 2, a Slowly Evolving Spectrum generation process 48 receives the aligned characteristic
spectra and processes them to obtain a Slowly Evolving Spectrum. Conventionally, this has been done by storing, say,
seven consecutive spectra and then applying a moving average filter to the evolution of the complex values associated
with each frequency interval (Figure 4). Expressed in mathematical notation (here the complex spectral numbers are
represented in the form of Real and Imaginary parts but the conversion from the Magnitude and Phase representation
is trivial)

$$SES[i,\omega] = \sum_{m} a_m\,\mathrm{Re}\{CS_{aligned}[i+m,\omega]\} + j\sum_{m} a_m\,\mathrm{Im}\{CS_{aligned}[i+m,\omega]\} \qquad \text{(Equation 5)}$$


[0044] Where SES[i,ω] represents the complex spectral values of a modified spectrum for the ith sub-frame of the
residual signal and a_m represent the coefficients of the moving average filter.

[0045] According to the present embodiment, for each sub-frame, a series of operations is carried out on stored
aligned characteristic spectra including the one associated with the current sub-frame and the six respectively
associated with the six nearest sub-frames (Figure 5). In the first of these operations, a counter is set to zero (step 60). A
moving average filter 62 is then applied to the evolutions of the complex spectral values associated with respective
frequency intervals to provide a modified spectrum 64 to be associated with the current sub-frame.
[0046] The phase values of the modified spectrum are then replaced (step 66) by the phase values of the aligned
characteristic spectrum associated with the current sub-frame to provide a hybrid characteristic spectrum 67 associated
with the current sub-frame.

[0047] The counter is then increased by one (step 68) and a check is made on the value of the counter (step 70). If it
has not yet reached six then the filtering 62 and phase replacement 66 steps are carried out on the hybrid characteristic
spectrum just obtained.

[0048] If the counter has reached six then the magnitude values of the hybrid characteristic spectrum 67 obtained
after the sixth replacement operation are output by the Slowly Evolving Spectrum generation process (Figure 2, 48) as
the Slowly Evolving Spectrum 71 for the current sub-frame.
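
The loop of Figure 5 can be illustrated with a short Python sketch. It rests on several assumptions that the text does not settle: the moving average uses equal coefficients, the hybrid spectrum takes the place of the current sub-frame's spectrum in the stored window before the next filtering pass while the neighbouring spectra stay unchanged, and the function and parameter names are hypothetical.

```python
import cmath


def slowly_evolving_magnitudes(aligned, coeffs=None, stages=6):
    """Filter with phase reinstatement, returning SES magnitudes.

    aligned: odd-length list of aligned characteristic spectra (each a list
             of complex values); the middle entry is the current sub-frame.
    coeffs:  moving average coefficients (equal weights are an assumption).
    """
    n = len(aligned)
    centre = n // 2
    if coeffs is None:
        coeffs = [1.0 / n] * n
    original_phase = [cmath.phase(v) for v in aligned[centre]]
    window = [list(spectrum) for spectrum in aligned]
    for _ in range(stages):
        # Moving average along the evolution of each frequency interval.
        modified = [
            sum(a * spectrum[w] for a, spectrum in zip(coeffs, window))
            for w in range(len(original_phase))
        ]
        # Keep the filtered magnitude, reinstate the aligned phase (hybrid).
        window[centre] = [
            cmath.rect(abs(value), phase)
            for value, phase in zip(modified, original_phase)
        ]
    return [abs(value) for value in window[centre]]


# Hypothetical single-bin evolutions: steady (voiced-like) and erratic (noise-like).
voiced = [[1 + 0j]] * 7
noisy = [[1 + 0j], [-1 + 0j], [1j], [1 + 0j], [-1j], [1 + 0j], [-1 + 0j]]
ses_voiced = slowly_evolving_magnitudes(voiced)
ses_noisy = slowly_evolving_magnitudes(noisy)
```

In this toy case the steady evolution keeps its magnitude, whereas the erratic one, whose neighbouring values cancel, is reduced by a factor of seven at every stage instead of only once.
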
[0049] The Slowly Evolving Spectrum (SES) 71 is passed to the Rapidly Evolving Spectrum generation process 50.
The Rapidly Evolving Spectrum (RES) generation process 50 subtracts the SES magnitude values from the
corresponding magnitude values of the aligned characteristic spectrum associated with the current sub-frame to provide the
magnitude values of the RES.

[0050] Both the SES magnitude values and the RES magnitude values are then arranged into Mel-scaled frequency
intervals and the SES magnitude values 52 and RES magnitude values 54 for one out of every two sub-frames are
forwarded to the quantiser (Figure 1, 16).

[0051] As explained in relation to Figure 1, the stream of parameters (pitch 41, RES magnitude values 54, SES
magnitude values 52, LSFs 37) output by the WI encoder 14 are received at the decoder 28 in user B's mobile phone 24.
[0052] The processes carried out in the decoder 28 are now described with reference to Figure 6. The SES magnitude
values 52 are passed to a phase generation process 80 which generates phase values to be associated with the
magnitude values on the basis of known assumptions. In this embodiment the phase values are generated in the way
described in the Applicant's International Patent Application No. PCT/GB97/02037 published as WO 98/05029. The
phase values and the SES magnitude values are combined to provide a complex SES characteristic spectrum.
[0053] The RES magnitude values 54 are combined with random phase values 82 to generate a complex RES
characteristic spectrum.
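
Combining magnitudes with random phases amounts to a polar-to-rectangular conversion. The sketch below is illustrative only: the text says no more than that the phases are random, so the uniform distribution over [-π, π] and the function name are assumptions.

```python
import cmath
import math
import random


def res_spectrum(res_magnitudes, rng=None):
    """Attach a random phase to each RES magnitude.

    The phases are drawn uniformly from [-pi, pi] (an assumption).
    """
    rng = rng or random.Random()
    return [
        cmath.rect(magnitude, rng.uniform(-math.pi, math.pi))
        for magnitude in res_magnitudes
    ]


# Hypothetical usage with a seeded generator for repeatability.
spectrum = res_spectrum([0.5, 1.0, 0.0], random.Random(0))
```
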

[0054] Interpolation processes 84, 86 are then carried out on the two types of spectra to obtain one spectrum of each
type every 2.5ms. The two spectra thus created are then combined 88 to provide an approximation to a characteristic
spectrum for each sub-frame. The approximate characteristic spectrum is then passed, together with the pitch 41, to a
cubic interpolation synthesis process 90 which operates in a known manner to reconstruct an approximation to the
residual signal originally derived in the LSF analysis process in the encoder (Figure 2, 38). A filter 92 which is the
inverse of the analysis filter (Figure 2, 38) is then used to provide an approximation of the audio signal originally passed
to the encoder (Figure 1, 14).

[0055] Owing to the nature of the decoding process (Figure 6), it is important that the SES magnitude values 52 are
very low for unvoiced speech. If this is not the case then unvoiced speech components are synthesised in the same way
as voiced components which results in the output speech sounding buzzy.

[0056] In the above-described embodiment of the present invention, the SES generation process (Figure 2, 48) is
better able to reduce the SES magnitude values associated with unvoiced speech than the processes used in prior-art PWI
encoders. In prior-art coders the erratic evolution of the phase values does result in the low-pass filtering operation
(Figure 4, 57) reducing the magnitude values of the resultant SEW for the corresponding frequency interval. However, the
present invention improves on this since it gives extra weight to the phase information in the characteristic spectrum (it
will be recalled that it is the phase information that especially distinguishes unvoiced speech/noise from voiced speech).
Extra weighting of the phase information is achieved by replacing the phase values at each stage of the iterative filtering
process and thereby reintroducing the erratic phase values that particularly distinguish voiced and unvoiced speech
before the next filtering stage. The result is low SES magnitudes associated with unvoiced speech and hence a less
buzzy output than known encoders.

[0057] The reduction of the SES magnitude with repeated filtering stages is illustrated in Figure 7. It can be seen that
there is little reduction in magnitude values associated with voiced speech, but that repeated iterations of the filter
strongly reduce the magnitude values associated with unvoiced speech.

[0058] In other embodiments, in the SES generation process, the phase values obtained from any earlier filtering
stage could be used to replace the phase resulting after a later filtering stage. Such a method would still provide a
degree of improvement over the prior art.

[0059] The above described processes (40, 42, 44, 46) which extract SES magnitude values from the residual signal
could be used to derive a voicing measure for each of the frequency bands for each sub-frame. The voicing measure
might simply be the ratio of the output SES magnitude to the original characteristic spectrum magnitude for a given
frequency interval. Such a set of processes might be useful in a Multi-Band Excitation speech coder.
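
Such a ratio-based voicing measure could be computed per frequency interval as in the following sketch. The function name, the guard against division by zero and the choice to report zero for empty intervals are assumptions added for illustration.

```python
def voicing_measure(ses_magnitudes, cs_magnitudes, eps=1e-12):
    """Ratio of SES magnitude to characteristic spectrum magnitude per interval.

    Intervals whose original magnitude is (almost) zero are reported as 0.0,
    which is an assumed convention to avoid dividing by zero.
    """
    return [
        ses / cs if cs > eps else 0.0
        for ses, cs in zip(ses_magnitudes, cs_magnitudes)
    ]


# Hypothetical usage: the first interval is fully voiced, the second mostly unvoiced.
measure = voicing_measure([1.0, 0.1, 0.0], [1.0, 1.0, 0.0])
```

Values near one indicate that the filtering left the magnitude intact (voiced), values near zero that it was largely removed (unvoiced or noisy).
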

[0060] At the expense of extra processing, the alignment stage 46 might be included within the repeated processes
contained within the loop illustrated in Figure 5. This would correct any drift introduced by the filtering process.
[0061] Those skilled in the art will be able to conceive of many different low-pass filters that may be used in the
low-pass filtering process 62.

[0062] In the above embodiment, each of the characteristic spectra corresponds to a single pitch period of the residual
signal. Instead, the characteristic waveforms could be of a fixed length allowing the use of an efficient Fast Fourier 
Transform (FFT) algorithm to calculate the characteristic spectra. The characteristic spectra might then contain peaks 
and troughs corresponding to the fundamental of the input signal (which, of course, need not be a residual signal). The 
application of the iterative process described in relation to Figure 5 would then retain the peaks but reduce the troughs 
further. Such a method is likely to have application in noise reduction algorithms that might be applied to speech, music 
or any other at least partly periodic audio signals. 

[0063] The improved separation of the spectra representing the unvoiced and voiced speech might also find applica- 
tion in speech recognition devices. 



Claims 



1. A method of extracting one of a concordant component and a discordant component of a predetermined segment
of an audio signal, said method comprising the steps of:

1 forming an initial evolution surface from a series of combined magnitude and phase spectra representing seg- 
/ ments of said signal around said predetermined segment; 

modifying said initial evolution surface to obtain a nx»dified evolution surface representing said one of the con- 
cordant corrponent or the discordant component of said signal; and 

! extracting said one of the concordarrt component or the discordant component of said predetermined segment 
from said modified evolution surface; 
wherein said modifying step involves: 

a plurality of component filtering steps and. prior to at least one of those filtering steps, the substitution of 
phase information derived from said initial evolution surface or an eariier one of the component steps for the 
phase information derived from the most recent conponent step. 

2. A method according to claim 1 wherein said component steps comprise respective low-pass filtering steps whereby
said modification step provides a modified evolution surface representing the concordant component of said
predetermined segment.

3. A method according to claim 2 wherein each low-pass filtering step involves the application of an identical low-pass
filter.



4. A method according to any preceding claim wherein phase information derived from said initial evolution surface is
used in all of said component steps.

5. A method according to any preceding claim further comprising the step of calculating the other of the concordant
component and the discordant component by subtracting said one of the two components from said initial evolution
surface.



6. A method according to claim 1 wherein said component steps comprise respective high-pass filtering steps
whereby said modification step provides a modified evolution surface representing the discordant component of
said predetermined segment.

7. A method according to claim 1 wherein said audio signal is substantially periodic and each predetermined segment
represents a different pitch period. 



8. A method of separating voiced speech from unvoiced speech and noise, said method comprising the steps of any
preceding claim where said audio signal represents speech and said voiced speech corresponds to said concordant
component and said unvoiced speech and noise corresponds to said discordant component.

9. A method of speech coding comprising the separation method of claim 8 whereby more information is used to code
the voiced speech than is used to code the unvoiced speech and noise.

10. An audio signal processor operable to extract one of a concordant component and a discordant component of a
predetermined segment of an audio signal, said apparatus comprising:

means arranged in operation to form an initial evolution surface from a series of combined magnitude and
phase spectra representing segments of said signal around said predetermined segment;

means arranged in operation to modify said initial evolution surface to obtain a modified evolution surface
representing said one of the concordant component or the discordant component of said signal; and

means arranged in operation to extract said one of the concordant component or the discordant component of
said predetermined segment from said modified evolution surface;
wherein said apparatus further comprises:

means arranged in operation to carry out a plurality of filtering steps and, prior to at least one of those filtering
steps, to substitute phase information derived from said initial evolution surface or an earlier one of the
component steps for the phase information derived from the most recent component step.

11. A speech coding apparatus including:

a storage medium having recorded therein processor readable code processable to encode input speech data,
said code including:

initial evolution surface generation code processable to generate initial evolution surface data comprising
combined magnitude and phase data for segments of said input speech data;
separation code processable to derive separate phase data and magnitude data from said input speech data;
evolution surface modification code processable to generate a modified evolution surface representing one of
a voiced component or an unvoiced/noise component of said input speech data; and
component extraction code processable to extract said one of the voiced component or the unvoiced/noise
component from said input speech data;
wherein said evolution surface modification code comprises:
evolution surface filtering code processable to filter said initial evolution surface data a plurality of times;
evolution surface decomposition code processable to derive magnitude data and phase data subsequent to
one or more of said filtering steps; and

earlier phase reinstatement code processable to replace the phase data obtained on processing said evolution
surface decomposition code with an earlier version of the phase data.

12. A method of waveform interpolation speech coding comprising: 

forming an initial evolution surface from a series of combined characteristic waveforms or spectra representing
respective segments of said speech;

wherein said formation involves aligning each of said characteristic waveforms or spectra with an earlier
characteristic waveform or spectrum of said series; and

said earlier waveform or spectrum is separated from the characteristic waveform or spectrum to be aligned with
it by a variable number of members of said series, said variable number varying in accordance with the pitch
of said signal.